
    Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training

    The use of self-supervised pre-training has emerged as a promising approach to enhance the performance of visual tasks such as image classification. In this context, recent approaches have employed the Masked Image Modeling paradigm, which pre-trains a backbone by reconstructing visual tokens associated with randomly masked image patches. This masking approach, however, introduces noise into the input data during pre-training, leading to discrepancies that can impair performance during the fine-tuning phase. Furthermore, input masking neglects the dependencies between corrupted patches, increasing the inconsistencies observed in downstream fine-tuning tasks. To overcome these issues, we propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT), that employs autoregressive and permuted predictions to capture intra-patch dependencies. In addition, MaPeT employs auxiliary positional information to reduce the disparity between the pre-training and fine-tuning phases. In our experiments, we employ a fair setting to ensure reliable and meaningful comparisons and conduct investigations on multiple visual tokenizers, including our proposed k-CLIP, which directly employs discretized CLIP features. Our results demonstrate that MaPeT achieves competitive performance on ImageNet compared to baselines and competitors under the same model setting. Source code and trained models are publicly available at: https://github.com/aimagelab/MaPeT
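Permuted prediction, in the spirit of permutation language modeling (as in XLNet), predicts patches in a randomly sampled order. Each patch is conditioned only on the patches that precede it in that order. The following minimal NumPy sketch samples such an order and builds the matching attention mask. It is not code from the MaPeT repository. The function name, the boolean-mask convention, and the choice to predict only the last `num_predicted` positions of the order are illustrative assumptions.

```python
import numpy as np


def permuted_prediction_mask(num_patches, num_predicted, rng):
    """Sample a factorization order over image patches and build the
    attention mask for permuted autoregressive prediction (sketch).

    Returns (order, mask, targets), where mask[i, j] is True when patch i
    may attend to patch j, i.e. when j comes strictly before i in `order`.
    """
    order = rng.permutation(num_patches)            # random factorization order
    rank = np.empty(num_patches, dtype=int)
    rank[order] = np.arange(num_patches)            # rank[p] = position of patch p in order
    mask = rank[None, :] < rank[:, None]            # attend only to earlier patches
    targets = order[num_patches - num_predicted:]   # assumed: only the tail of the order is predicted
    return order, mask, targets


rng = np.random.default_rng(0)
order, mask, targets = permuted_prediction_mask(num_patches=16, num_predicted=4, rng=rng)
```

In a scheme of this kind, no input patch is replaced by a mask token. The prediction targets are hidden from one another only through the attention mask, and the order changes at every training step.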

    Pre-processing, classification and semantic querying of large-scale Earth observation spaceborne/airborne/terrestrial image databases: Process and product innovations.

    As defined by Wikipedia, “big data is the term adopted for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The big data challenges typically include capture, curation, storage, search, sharing, transfer, analysis and visualization”. Proposed by the intergovernmental Group on Earth Observations (GEO), the visionary goal of the Global Earth Observation System of Systems (GEOSS) implementation plan for the years 2005-2015 is the systematic transformation of multisource Earth Observation (EO) “big data” into timely, comprehensive and operational EO value-adding products and services, subject to the calibration/validation (Cal/Val) requirements of the GEO Quality Assurance Framework for Earth Observation (QA4EO). To date, the GEOSS mission cannot be considered fulfilled by the remote sensing (RS) community. This is tantamount to saying that past and existing EO image understanding systems (EO-IUSs) have been outpaced by the rate of collection of EO sensory big data, whose quality and quantity are ever-increasing. This fact is supported by several observations. For example, no European Space Agency (ESA) EO Level 2 product has ever been systematically generated at the ground segment. By definition, an ESA EO Level 2 product comprises a single-date multi-spectral (MS) image, radiometrically calibrated into surface reflectance (SURF) values and corrected for geometric, atmospheric, adjacency and topographic effects, stacked with its data-derived scene classification map (SCM). The thematic legend of the SCM is general-purpose and user- and application-independent, and it includes quality layers such as cloud and cloud-shadow. Since no GEOSS exists to date, present EO content-based image retrieval (CBIR) systems lack EO image understanding capabilities.
Hence, no semantic CBIR (SCBIR) system exists to date either, where semantic querying is a synonym for semantics-enabled knowledge/information discovery in multi-source big image databases. In set theory, if set A is a strict superset of (or strictly includes) set B, then A ⊃ B. This doctoral project started from the working hypothesis that SCBIR ⊃ computer vision (CV) ⊃ EO image understanding (EO-IU) in operating mode ⊃ ESA EO Level 2 product ⊃ human vision. Here, vision is a synonym for scene-from-image reconstruction and understanding, and EO-IU in operating mode is a synonym for GEOSS. Since this means that CV in operating mode is a necessary but not sufficient precondition for SCBIR, the working hypothesis has two corollaries. First, human visual perception, including well-known visual illusions such as the Mach bands illusion, acts as a lower bound of CV within the multi-disciplinary domain of cognitive science; i.e., CV is constrained to include a computational model of human vision. Second, a necessary but not sufficient precondition for the yet-unfulfilled development of GEOSS is the systematic generation of the ESA EO Level 2 product at the ground segment. Starting from this working hypothesis, the overarching goal of this doctoral project was to contribute to research and technical development (R&D) toward filling an analytic and pragmatic information gap between EO big sensory data and EO value-adding information products and services. This R&D objective was twofold. The first objective was to develop an original EO-IUS in operating mode, a synonym for GEOSS, capable of systematic ESA EO Level 2 product generation from multi-source EO imagery.
EO imaging sources vary in terms of: (i) platform, either spaceborne, airborne or terrestrial; and (ii) imaging sensor. The sensor is either (a) optical or (b) synthetic aperture radar (SAR). Optical sources include radiometrically calibrated or uncalibrated images, and panchromatic or color images: true- or false-color red-green-blue (RGB), multi-spectral (MS), super-spectral (SS) or hyper-spectral (HS), with spatial resolution ranging from low (> 1 km) to very high (< 1 m). SAR sources consist specifically of bi-temporal RGB SAR imagery. The second R&D objective was to design and develop a prototypical implementation of an integrated closed-loop EO-IU for semantic querying (EO-IU4SQ) system as a GEOSS proof-of-concept in support of SCBIR. The proposed closed-loop EO-IU4SQ system prototype consists of two subsystems for incremental learning. The primary subsystem is a feedback EO-IU subsystem in operating mode. It is dominant, and necessary but not sufficient. It is also hybrid, combining deductive/top-down/physical model-based and inductive/bottom-up/statistical model-based reasoning. Requiring no human-machine interaction, it automatically transforms, in linear time, a single-date MS image into an ESA EO Level 2 product as initial condition. A secondary (dependent) hybrid feedback EO Semantic Querying (EO-SQ) subsystem is provided with a graphical user interface (GUI) to streamline human-machine interaction in support of spatiotemporal EO big data analytics and SCBIR operations. EO information products generated as output by the closed-loop EO-IU4SQ system monotonically increase their added value with closed-loop iterations.
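Write A ⊃ B for "set A strictly includes set B". The working hypothesis of this project is then a chain of strict inclusions. The formalization that follows is one reading, not the author's own notation beyond the chain itself. It assumes, as the abstract's wording suggests, that A ⊃ B means B is a necessary but not sufficient precondition for A. Under that reading, the first corollary follows from the transitivity of ⊃, and the second follows directly from the link between GEOSS and the Level 2 product.

```latex
\[
\text{SCBIR} \supset \text{CV} \supset \text{EO-IU (GEOSS)} \supset \text{ESA EO Level 2} \supset \text{human vision}
\]
\[
A \supset B \;\Longrightarrow\; \bigl(A \Rightarrow B\bigr) \wedge \bigl(B \not\Rightarrow A\bigr)
\]
\[
\text{CV} \supset \text{EO-IU} \supset \text{ESA EO Level 2} \supset \text{human vision}
\;\Longrightarrow\; \text{CV} \supset \text{human vision} \quad \text{(first corollary)}
\]
\[
\text{EO-IU (GEOSS)} \supset \text{ESA EO Level 2}
\;\Longrightarrow\; \bigl(\text{GEOSS} \Rightarrow \text{systematic Level 2 generation}\bigr) \quad \text{(second corollary)}
\]
```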

    Historical Document Digitization through Layout Analysis and Deep Content Classification

    Document layout segmentation and recognition is an important task in the creation of digitized document collections, especially when dealing with historical documents. This paper presents a hybrid approach to layout segmentation, as well as a strategy to classify document regions, and applies both to the digitization of a historical encyclopedia. Our layout analysis method merges a classic top-down approach and a bottom-up classification process based on local geometrical features. Regions are then classified by a Random Forest classifier fed with features extracted from a Convolutional Neural Network. Experiments are conducted on the first volume of the "Enciclopedia Treccani", a large dataset containing 999 manually annotated pages from the historical Italian encyclopedia.
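The abstract does not name its top-down method. As a stand-in, here is recursive XY-cut, a well-known classic top-down layout segmentation technique. It repeatedly splits a binarized page along empty rows or columns until no further cut is possible. This is a minimal NumPy sketch, not the paper's implementation. The function names, the `min_gap` parameter and the "horizontal cuts first" rule are illustrative assumptions.

```python
import numpy as np


def _runs(mask_1d):
    """Return (start, end) pairs (end exclusive) of consecutive True runs."""
    padded = np.concatenate(([False], mask_1d, [False])).astype(int)
    diff = np.diff(padded)
    return list(zip(np.flatnonzero(diff == 1), np.flatnonzero(diff == -1)))


def _merge(runs, min_gap):
    """Merge ink runs separated by fewer than `min_gap` empty lines."""
    merged = [list(runs[0])]
    for s, e in runs[1:]:
        if s - merged[-1][1] < min_gap:
            merged[-1][1] = e
        else:
            merged.append([s, e])
    return merged


def xy_cut(page, min_gap=1, top=0, left=0):
    """Recursively split a binary page (True = ink) into rectangular blocks.

    Returns (top, left, bottom, right) boxes in page coordinates, end exclusive.
    """
    ys, xs = np.nonzero(page)
    if ys.size == 0:
        return []
    # Crop to the bounding box of the ink.
    r0, r1, c0, c1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    page = page[r0:r1, c0:c1]
    top, left = top + r0, left + c0
    for axis in (1, 0):  # axis=1: row profile (horizontal cuts); axis=0: column profile
        bands = _merge(_runs(page.any(axis=axis)), min_gap)
        if len(bands) > 1:
            boxes = []
            for s, e in bands:
                if axis == 1:
                    boxes += xy_cut(page[s:e, :], min_gap, top + s, left)
                else:
                    boxes += xy_cut(page[:, s:e], min_gap, top, left + s)
            return boxes
    return [(int(top), int(left), int(top + page.shape[0]), int(left + page.shape[1]))]


# Two side-by-side blocks above one wide block.
page = np.zeros((10, 10), dtype=bool)
page[1:4, 1:4] = True
page[1:4, 6:9] = True
page[6:9, 1:9] = True
blocks = xy_cut(page)
```

In a hybrid pipeline like the one described, the resulting blocks would then be refined or labeled by the bottom-up, feature-based classifier.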

    Preliminary Assessment of HABIT for Children with Unilateral Cerebral Palsy Using Fidelity Measures

    Purpose/Hypothesis: The purpose of this study was to behaviorally code participants’ behaviors during a Hand Arm Bimanual Intensive Training (HABIT) camp. It was hypothesized that the HABIT program would achieve high levels of motor and social behaviors, using behavioral coding as a measure of fidelity. Number of Subjects: Five children (mean age = 8.8 years, SD = 1.6 years), three of them female, diagnosed with unilateral cerebral palsy (CP) with right-side impairment. Participants were classified at Manual Ability Classification System (MACS) levels I-III. Materials and Methods: The HABIT camp took place over a two-week period, with ten days of intervention, four hours daily, for a total of 40 hours. Daily intervention was overseen by two therapists assisted by seven volunteers trained on HABIT key principles. A fidelity measurement was implemented to establish, through behavioral coding, whether participant behaviors were congruent with the intervention principles of HABIT. Video footage was collected at random intervals throughout the intervention to measure the duration of the following behaviors:
    • right/left contact;
    • right/left object manipulation;
    • tasks [i.e., therapist-provided activities that either do (complex tasks) or do not (simple tasks) cognitively challenge the subject];
    • social engagement with peers;
    • focused attention (i.e., when the subject focuses on an object while exploring it).
    This preliminary report covers three random videos per participant, averaging approximately 30 minutes per video (6.75 hours in total). Datavyu software was used to code behaviors [interrater reliability = 85.4% ± 8.6]. The variables were summed as durations and normalized as percentages. Results: On average, the percentage of the duration of contacts was relatively equal between the left (M = 63.8, SD = 11.7) and right (M = 46.1, SD = 12.5) hands. The percentage of the duration of object manipulation varied between the left (M = 20.0, SD = 10.9) and right (M = 5.7, SD = 5.8) hands.
Children were engaged in simple tasks (e.g., playing with play dough) (M = 34.9, SD = 12.4) more often than in complex tasks (e.g., a target game) (M = 20.7, SD = 12.8), although this varied by participant. Children were socially engaged with their peers (M = 51.0, SD = 12.9) and focused on an object while exploring it (M = 34.8, SD = 13.3). Conclusions: Both hands had a similar duration of contact. Manipulation differed greatly between hands, favoring the unaffected left hand. This may be due to the participants' MACS levels and their use of the affected hand primarily for support. Simple tasks were performed more often than complex tasks, and social engagement with peers occurred most of the time. Clinical Relevance: This preliminary report of the 2022 HABIT camp suggests that the intervention meets its established principles of high intensity and engagement but may be limited in meeting challenging task goals. This study adds to existing research testing HABIT’s methodological approach to physical therapy intervention and to the use of fidelity measures in clinical settings relating to HABIT programming.
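The methods state that coded behaviors were summed as durations, normalized as percentages, and then summarized as a mean (M) and standard deviation (SD) across participants. The following minimal Python sketch shows that arithmetic. It assumes a hypothetical (behavior, onset, offset) event format rather than Datavyu's actual export. It also assumes that each participant's durations are normalized by that participant's total coded video time.

```python
from collections import defaultdict
from statistics import mean, stdev


def percent_durations(events, video_seconds):
    """Sum coded durations per behavior and normalize by total video time.

    events: iterable of (behavior, onset_s, offset_s) tuples for ONE participant,
            pooled across that participant's videos (hypothetical format).
    video_seconds: total length of that participant's coded videos.
    Behaviors may overlap in time (e.g. both hands in contact), so the
    percentages need not sum to 100.
    """
    totals = defaultdict(float)
    for behavior, onset, offset in events:
        totals[behavior] += offset - onset
    return {b: 100.0 * d / video_seconds for b, d in totals.items()}


def group_summary(per_participant, behavior):
    """Mean and sample SD of one behavior's percentage across participants."""
    values = [p.get(behavior, 0.0) for p in per_participant]
    return mean(values), stdev(values)


p1 = percent_durations([("left_contact", 0, 50), ("left_contact", 100, 130),
                        ("right_contact", 0, 20)], video_seconds=200)
p2 = percent_durations([("left_contact", 0, 120)], video_seconds=200)
left_m, left_sd = group_summary([p1, p2], "left_contact")
```

Because left- and right-hand contact can occur at the same time, the two contact percentages reported above can legitimately add up to more than 100.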

    Novelty Detection with Autoencoders for System Health Monitoring in Industrial Environments

    Predictive Maintenance (PdM) is the most recent strategy for maintenance management in industrial contexts. It aims to predict the occurrence of a failure in order to minimize unexpected downtime and maximize the useful life of components. In data-driven approaches, PdM uses Machine Learning (ML) algorithms for three tasks: to extract relevant features from signals, to identify and classify possible faults (diagnostics), and to predict the components’ remaining useful life (prognostics). The major challenge lies in the high complexity of industrial plants, where operating conditions change over time and a large number of unknown modes occur. Novelty detection offers a solution to this problem: a representation of the machinery's normal operating state is learned and compared with online measurements to identify new operating conditions. In this paper, a systematic study of autoencoder-based methods for novelty detection is conducted. We introduce an architecture template that includes a classification layer to detect and separate the operating conditions, and a localizer to identify the most influential signals. Four implementations, with different deep learning models, are described and used to evaluate the approach on data collected from a test rig. The evaluation shows the effectiveness of the architecture and that the autoencoders outperform the current baselines.
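The core of autoencoder-based novelty detection works as follows. The model is trained only on healthy data, and a sample is flagged when its reconstruction error exceeds a threshold learned from that data. The paper uses deep models. This minimal NumPy sketch swaps in a linear autoencoder instead. Under squared-error loss, the optimal linear autoencoder spans the top principal subspace, so it can be fitted in closed form with an SVD. Two further choices are illustrative assumptions, not the paper's method: the 99th-percentile threshold, and a "localizer" that picks the signal with the largest reconstruction error. The paper's classification layer is not shown.

```python
import numpy as np


class LinearAENoveltyDetector:
    """Novelty detector built on a linear autoencoder (sketch).

    The encoder/decoder weights are tied and set to an orthonormal PCA basis,
    which is an optimal solution for a linear autoencoder under squared error.
    """

    def __init__(self, n_components=2, quantile=0.99):
        self.n_components = n_components
        self.quantile = quantile  # assumed threshold rule on training errors

    def _residuals(self, X):
        Z = (X - self.mean_) / self.std_          # standardize each signal
        recon = Z @ self.W_ @ self.W_.T           # encode, then decode
        return Z - recon

    def score(self, X):
        """Mean squared reconstruction error per sample (novelty score)."""
        return np.mean(self._residuals(X) ** 2, axis=1)

    def fit(self, X):
        """Learn the normal operating state from healthy data only."""
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12
        Z = (X - self.mean_) / self.std_
        _, _, vt = np.linalg.svd(Z, full_matrices=False)
        self.W_ = vt[: self.n_components].T       # shape (signals, components)
        self.threshold_ = np.quantile(self.score(X), self.quantile)
        return self

    def predict(self, X):
        """True where a sample looks like a new operating condition."""
        return self.score(X) > self.threshold_

    def localize(self, x):
        """Index of the signal contributing most to one sample's error."""
        return int(np.argmax(self._residuals(x[None, :])[0] ** 2))


# Synthetic "healthy" data: three correlated sensor signals plus noise.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
healthy = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(500, 3))
detector = LinearAENoveltyDetector(n_components=1).fit(healthy)
faulty = np.array([[0.0, 0.0, 5.0]])  # third signal breaks the usual correlation
```

Replacing the SVD with a trained nonlinear encoder/decoder keeps the same score, threshold and localize logic.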

    Visually Evoked Postural Responses (VEPRs) in Children with Vestibular Migraine

    Vestibular migraine (VM) is the most common cause of episodic vertigo in children. Children with migraine often complain of vertigo, nausea, dizziness and unsteadiness, which can precede, follow or accompany the headache. The aim of this study was to use posturography to investigate the visually evoked postural responses (VEPRs) of children with VM and to compare them with data obtained from children with primary headache (M) and from controls (C). Twenty children diagnosed with VM, nineteen children with M without aura and twenty healthy subjects were recruited in this cross-sectional study. Posturography was performed on a standardized stabilometric force platform (Svep-Politecnica) in the following conditions: eyes open (OE), eyes closed (CE) and during full-field horizontal optokinetic stimulation (OKN-S). Electronystagmography was performed simultaneously to analyze optokinetic reflex parameters. In the OE condition, no difference in body sway area was found between groups. In contrast, in the CE condition this parameter was larger in the two pathological groups than in controls. Optokinetic stimulation also induced a similar increase in body sway area in the M group relative to controls, while a further increase was elicited in the VM group. Electronystagmographic recordings also revealed different optokinetic reflex parameters in the latter groups. This study disclosed an abnormal sensitivity of children with M and VM to full-field moving scenes, with consequent destabilization of posture, as documented by the abnormal VEPRs. Children with VM were particularly exposed to this risk. Possible clinical implications of these findings are discussed.
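Body sway area is the key posturographic outcome here, but the abstract does not say how the force platform computes it. One common definition is the area of the 95% confidence ellipse of the centre-of-pressure (COP) trajectory. The following minimal sketch uses that definition as an assumption. For a bivariate normal, the 2-degree-of-freedom chi-square quantile is -2 ln(1 - p), and the ellipse area is π · χ² · sqrt(det Σ).

```python
import math

import numpy as np


def sway_ellipse_area(cop_xy, coverage=0.95):
    """Area of the confidence ellipse of centre-of-pressure (COP) samples.

    cop_xy: array-like of shape (n_samples, 2), COP coordinates (e.g. in mm).
    For 2 degrees of freedom the chi-square quantile is -2 ln(1 - coverage),
    and the ellipse {x : (x - mu)^T S^-1 (x - mu) <= chi2} has area
    pi * chi2 * sqrt(det(S)).
    """
    cop_xy = np.asarray(cop_xy, dtype=float)
    cov = np.cov(cop_xy, rowvar=False)          # 2x2 sample covariance
    chi2 = -2.0 * math.log(1.0 - coverage)      # about 5.991 for 95 % coverage
    return math.pi * chi2 * math.sqrt(np.linalg.det(cov))


pts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
area = sway_ellipse_area(pts)
```

Doubling every COP coordinate quadruples the area, as expected for an area measure. Platforms that instead report the area swept by the trajectory would give different numbers.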

    Segment-based simple-connectivity measure design and implementation

    In developing different measures for the description of a segment’s shape, we noted that it would be useful to include a measure capable of quantifying the presence of holes. This was motivated by the following scenario. The measures we use to characterize a segment’s shape decrease monotonically with the presence of holes. These include RoundnessAndNoHole (also known as compactness), ConvexityAndNoHole and RectangularityAndNoHole, namely:
    • RoundnessAndNoHole is high if Roundness is high and condition NoHole is true;
    • ConvexityAndNoHole is high if Convexity is high and condition NoHole is true;
    • RectangularityAndNoHole is high if Rectangularity is high and condition NoHole is true.
    For example, a region with a perfectly round external boundary but containing several holes will present a low RoundnessAndNoHole measure. Were the holes not present, the region would instead feature a very high RoundnessAndNoHole measure. Besides these measures, our newly introduced version of an elongatedness measure is also affected by the presence of holes, increasing as the number of holes increases. In our study of satellite images, it is very common to find segments that contain holes, whether because of actual holes in the observed structure or because of segmentation errors. We want to reason about these situations without changing the definitions of the shape measures already in use, which are quite natural and intuitive. To that end, we introduce a new measure to quantify the presence of holes, which we call simple-connectivity. The simple-connectivity measure quantifies the extent to which a region is simply connected. That is, it should decrease monotonically as the number of holes increases or, at a fixed number of holes, as their size increases.
    This work was supported in part by the National Aeronautics and Space Administration under Grant/Contract/Agreement No. NNX07AV19G issued through the Earth Science Division of the Science Mission Directorate.
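The abstract states the requirements for simple-connectivity but not its formula. The following Python sketch uses a hypothetical formula that satisfies both monotonicity requirements: s = (A / (A + H)) / (1 + n). Here A is the region area, H is the total hole area and n is the number of holes. It equals 1 for a hole-free region, falls as holes become more numerous, and, at a fixed count, falls as holes grow. Holes are found by flood-filling the background with 4-connectivity and keeping components that do not touch the image border. The function names and the formula are illustrative, not the authors' definition.

```python
from collections import deque

import numpy as np


def find_holes(region):
    """Sizes of background components that do not touch the image border.

    Assumes `region` is one 8-connected foreground component; the background
    is traversed with 4-connectivity (the usual complementary pairing).
    """
    region = np.asarray(region, dtype=bool)
    h, w = region.shape
    seen = region.copy()  # foreground pixels are never background
    sizes = []
    for r in range(h):
        for c in range(w):
            if seen[r, c]:
                continue
            queue = deque([(r, c)])  # breadth-first flood fill of one component
            seen[r, c] = True
            size, touches_border = 0, False
            while queue:
                y, x = queue.popleft()
                size += 1
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if not touches_border:
                sizes.append(size)
    return sizes


def simple_connectivity(region):
    """Hypothetical score in (0, 1]: 1 without holes, lower with more or larger holes."""
    area = int(np.count_nonzero(region))
    holes = find_holes(region)
    filled = area + sum(holes)
    return (area / filled) / (1 + len(holes))


square = np.zeros((9, 9), dtype=bool)
square[1:8, 1:8] = True
one_hole = square.copy(); one_hole[4, 4] = False
two_holes = square.copy(); two_holes[3, 3] = False; two_holes[5, 5] = False
big_hole = square.copy(); big_hole[3:6, 3:6] = False
```

Because the measure is separate from Roundness, Convexity and Rectangularity, it can be combined with them (e.g. as a NoHole condition) without altering their definitions.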